Custom Integration with Target as Azure Data Lake

Calibo Accelerate supports custom integration from various data sources, using Databricks for the integration and ingesting the data into an Azure Data Lake. This example uses the following combination:

RDBMS - MySQL (as the data source) > Databricks (for custom integration) > ADLS (as the data lake)

The following combinations of source and target are supported for Databricks custom integration:

Source | Custom Integration | Data Lake
RDBMS - MySQL | Databricks | Azure Data Lake
RDBMS - Oracle Server | Databricks | Azure Data Lake
RDBMS - Microsoft SQL Server | Databricks | Azure Data Lake
RDBMS - PostgreSQL | Databricks | Azure Data Lake
RDBMS - Snowflake | Databricks | Azure Data Lake
FTP | Databricks | Azure Data Lake
SFTP | Databricks | Azure Data Lake
Azure Data Lake (File Selection) | Databricks | Azure Data Lake
Azure Data Lake (Folder Selection) | Databricks | Azure Data Lake
Azure Data Lake | Databricks | Unity Catalog

 

To create a Databricks custom integration job

  1. Sign in to Calibo Accelerate and navigate to Products.

  2. Select a product and feature. Click the Develop stage of the feature.

  3. Click Data Pipeline Studio.

  4. Create a pipeline with the following stages and add the required technologies:

    Custom Integration using Databricks with source RDBMS MySQL to target ADLS

  5. Configure the source and target nodes.

  6. Click the data integration node and click Create Custom Job.

    Create Custom Job for MySQL to ADLS using Databricks

  7. Complete the following steps to create the Databricks custom integration job:

  8. After you create the job, click the job name to navigate to the Databricks Notebook.

To add custom code to your integration job

After you have created the custom integration job, click the Databricks Notebook icon.

Databricks Notebook for custom code

This takes you to the custom integration job in the Databricks UI, where you can review the sample code and replace it with your custom code.
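
For reference, a minimal sketch of what such custom code might look like is shown below. It reads the MySQL source table over JDBC and writes it to the ADLS target as Parquet. All names in this sketch (host, database, table, secret scope, container, and folder path) are hypothetical placeholders and are not generated by Calibo Accelerate; replace them with your own values.

  # Minimal sketch of custom integration code for a Databricks notebook (PySpark).
  # The spark session and dbutils are available by default in a Databricks notebook.
  # All connection details, secret scope names, and paths below are hypothetical placeholders.

  # Read the source table from MySQL over JDBC.
  jdbc_url = "jdbc:mysql://<mysql-host>:3306/<database>"
  source_df = (
      spark.read.format("jdbc")
      .option("url", jdbc_url)
      .option("dbtable", "<source-table>")
      .option("user", dbutils.secrets.get("<secret-scope>", "mysql-user"))
      .option("password", dbutils.secrets.get("<secret-scope>", "mysql-password"))
      .option("driver", "com.mysql.cj.jdbc.Driver")
      .load()
  )

  # Write the data to the target Azure Data Lake Storage (ADLS Gen2) location as Parquet.
  target_path = "abfss://<container>@<storage-account>.dfs.core.windows.net/raw/<source-table>"
  source_df.write.mode("overwrite").parquet(target_path)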

To run a custom integration job

You can run the custom integration job in one of the following ways:

  • Navigate to Calibo Accelerate and click the Databricks node. Click Start to initiate the job run.

  • Publish the pipeline and then click Run Pipeline.
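
After the job or pipeline run completes, you can optionally verify that the data landed in the ADLS target, for example by reading it back in a notebook cell. The path below is the same hypothetical placeholder used in the earlier sketch:

  # Read back the Parquet output written by the integration job to confirm the ingest.
  target_path = "abfss://<container>@<storage-account>.dfs.core.windows.net/raw/<source-table>"
  result_df = spark.read.parquet(target_path)
  print(result_df.count())   # number of rows ingested from MySQL
  result_df.show(5)          # preview the first few rows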

What's next? Snowflake Custom Transformation Job